Objective

Scrape video details from YouTube channels using the YouTube Data API, build a dataset from the scraped data, then analyse the dataset and produce an exploratory data analysis (EDA) report.

Importing Libraries
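The import cell itself is not shown here. A minimal sketch of what the later cells appear to need (`pd`, `np`, and a `youtube` client), assuming the google-api-python-client package is installed; `API_KEY`, the channel IDs, and the helper name `build_youtube_client` are placeholders, not taken from the original notebook.

```python
import numpy as np
import pandas as pd

API_KEY = "YOUR_API_KEY"                   # placeholder, replace with a real key
Ch_ID = ["CHANNEL_ID_1", "CHANNEL_ID_2"]   # placeholder channel IDs


def build_youtube_client(api_key):
    """Return a YouTube Data API v3 client (hypothetical helper)."""
    # Imported inside the function so this cell still loads when
    # google-api-python-client is not installed.
    from googleapiclient.discovery import build

    return build("youtube", "v3", developerKey=api_key)
```

With a real key, `youtube = build_youtube_client(API_KEY)` would give the client object that the functions in the following sections take as their first argument.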

Channel Details

def Chanel_detail(youtube, Ch_ID):
    All_data = []
    request = youtube.channels().list(
        part="snippet,id,statistics,contentDetails,topicDetails,localizations",
        id=",".join(Ch_ID))
    response = request.execute()
    for i in response["items"]:
        data = {"Chanel_Name": i["snippet"]["title"],
                "Subscribers": i["statistics"]["subscriberCount"],
                "Views": i["statistics"]["viewCount"],
                #"likes":i["statistics"]["likeCount"],
                "Total_videos": i["statistics"]["videoCount"],
                "Playlist": i["contentDetails"]["relatedPlaylists"]["uploads"],
                "Topic": i["topicDetails"]["topicCategories"][0]
                }
        All_data.append(data)
    df = pd.DataFrame(All_data)
    return df

Returns the channel details as a DataFrame, one row per channel.

chanel = Chanel_detail(youtube, Ch_ID)
# Re-index the rows starting from 1 instead of 0
i = np.arange(1, len(chanel["Chanel_Name"]) + 1)
chanel.set_index(i, inplace=True)

Village Cooking Channel has the highest subscriber count and the highest view count.

Let's Do some visualizations 🏃‍♂️

Popular topic

Most of the content is created in Technology, while Food is the most viewed content.

Subscriber Count

Village Cooking Channel has the largest number of subscribers.

Who posted the most videos?

C4ETech English posted the highest number of videos among them.

Is subscriber count related to views?

Let's see who has the most loyal subscribers

The How Hema channel has the highest subscriber-views ratio, so it may have the most loyal subscribers.
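The ratio behind this comparison can be sketched as follows. This is an illustration under assumptions: the rows are made-up placeholders standing in for the output of Chanel_detail, the column name `Views_per_subscriber` is hypothetical, and views divided by subscribers is only one possible reading of the ratio used in the notebook.

```python
import pandas as pd

# Placeholder rows standing in for the Chanel_detail output; values are made up.
chanel = pd.DataFrame({
    "Chanel_Name": ["Channel A", "Channel B", "Channel C"],
    "Subscribers": ["1000", "500", "2000"],
    "Views": ["50000", "40000", "60000"],
})

# The API returns the counts as strings, so convert before doing arithmetic.
for col in ["Subscribers", "Views"]:
    chanel[col] = pd.to_numeric(chanel[col])

# Assumed definition: average number of views per subscriber.
chanel["Views_per_subscriber"] = chanel["Views"] / chanel["Subscribers"]

ranked = chanel.sort_values("Views_per_subscriber", ascending=False)
most_loyal = ranked.iloc[0]["Chanel_Name"]
print(most_loyal)
```

A high value only hints at loyalty: views also come from non-subscribers, so the ratio is best read as a rough comparison between channels.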

Playlist Detail

def get_video_details(youtube, playlist):
    # `playlist` is a list of video IDs, sent to the API in batches of 50
    video_stats = []
    for start in range(0, len(playlist), 50):
        request = youtube.videos().list(
            part="snippet,statistics,contentDetails,topicDetails",
            id=",".join(playlist[start:start + 50]))
        response = request.execute()
        for item in response["items"]:
            v_data = {"Tittle": item["snippet"]["title"],
                      "Published date": item["snippet"]["publishedAt"],
                      "Views": item["statistics"]["viewCount"],
                      # likeCount is missing when the owner hides likes
                      "likes": item["statistics"].get("likeCount"),
                      "Duration": item["contentDetails"]["duration"],
                      }
            video_stats.append(v_data)
    return video_stats
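get_video_details expects a list of video IDs. A minimal sketch of how those IDs might be collected from a channel's uploads playlist (the "Playlist" value returned by Chanel_detail), assuming the same `youtube` client object; `get_video_ids` is a hypothetical helper name that does not appear in the original notebook.

```python
def get_video_ids(youtube, playlist_id):
    """Collect every video ID in one playlist, following pagination."""
    video_ids = []
    page_token = None  # None requests the first page
    while True:
        request = youtube.playlistItems().list(
            part="contentDetails",
            playlistId=playlist_id,
            maxResults=50,
            pageToken=page_token,
        )
        response = request.execute()
        for item in response["items"]:
            video_ids.append(item["contentDetails"]["videoId"])
        # The API omits nextPageToken on the last page.
        page_token = response.get("nextPageToken")
        if page_token is None:
            break
    return video_ids
```

The returned list could then be passed as the `playlist` argument of get_video_details.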

Let's see the data types of each column
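The values come back from the API as text, so the counts and dates usually need converting before any analysis. A small sketch, assuming a dataframe named `video_df` (a hypothetical name) built from the list returned by get_video_details; the two rows are made-up placeholders.

```python
import pandas as pd

# Placeholder rows in the shape produced by get_video_details; values are made up.
video_df = pd.DataFrame([
    {"Tittle": "Example video", "Published date": "2022-01-15T10:30:00Z",
     "Views": "1200", "likes": "85", "Duration": "PT4M13S"},
    {"Tittle": "Another example", "Published date": "2023-06-02T08:00:00Z",
     "Views": "300", "likes": "12", "Duration": "PT58S"},
])
print(video_df.dtypes)  # the raw columns hold text, not numbers

# Convert the count columns to numbers and the date column to datetimes.
video_df["Views"] = pd.to_numeric(video_df["Views"])
video_df["likes"] = pd.to_numeric(video_df["likes"])
video_df["Published date"] = pd.to_datetime(video_df["Published date"])
print(video_df.dtypes)
```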

Let's find the duration of each video

Separate Shorts into a different dataframe
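The notebook does not show how Shorts are identified. One possible approach, sketched here under assumptions, is to split on duration: the 60-second cutoff, the `Duration_sec` column, and the dataframe names are all hypothetical, and the rows are made-up placeholders.

```python
import pandas as pd

# Placeholder data; Duration_sec is assumed to hold the length in seconds.
video_df = pd.DataFrame({
    "Tittle": ["Long recipe", "Quick tip", "Full review"],
    "Duration_sec": [754, 42, 1310],
})

SHORTS_MAX_SECONDS = 60  # assumed cutoff, not taken from the notebook

is_short = video_df["Duration_sec"] <= SHORTS_MAX_SECONDS
shorts_df = video_df[is_short].reset_index(drop=True)
videos_df = video_df[~is_short].reset_index(drop=True)
print(len(shorts_df), len(videos_df))
```

A pure duration rule can misclassify very short regular uploads, which may be why a second pass over the remaining videos is worthwhile.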

Let's check whether any other Shorts remain among the videos

Let's add these to the Shorts dataset

In total there are 71 Shorts so far 📜

Time to save 🔥

Let's format duration 😈

Let's convert the time to seconds first
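The API reports each duration as an ISO 8601 string such as `PT4M13S`. A minimal sketch of one way to turn that into seconds using only the standard library; `duration_to_seconds` is a hypothetical helper name, and the pattern covers only the day, hour, minute, and second parts.

```python
import re

# Matches strings like "PT1H2M3S", "PT45S" or "P1DT2H".
_DURATION_RE = re.compile(
    r"^P(?:(?P<days>\d+)D)?"
    r"(?:T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?(?:(?P<seconds>\d+)S)?)?$"
)


def duration_to_seconds(duration):
    """Convert an ISO 8601 duration string to a number of seconds."""
    match = _DURATION_RE.match(duration)
    if match is None:
        raise ValueError(f"Unrecognised duration: {duration!r}")
    # Parts missing from the string are treated as zero.
    parts = {name: int(value or 0) for name, value in match.groupdict().items()}
    return (parts["days"] * 86400 + parts["hours"] * 3600
            + parts["minutes"] * 60 + parts["seconds"])


print(duration_to_seconds("PT4M13S"))
```

Applied to a dataframe column, this could be written as `df["Duration"].apply(duration_to_seconds)`, where `df` is whatever dataframe holds the video details.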

Let's change the Minutes and Sec datatypes to int

Now the published date and the duration are in datetime format, and we have extracted the published year from the date. Now it's time for some EDA 💥

EDA

Year-wise likes and views
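A year-wise summary like this can be produced with a groupby. The sketch below assumes a dataframe `video_df` with a `Published year` column and numeric `Views` and `likes` columns; the column and variable names are assumptions and the numbers are made-up placeholders.

```python
import pandas as pd

# Placeholder data; values are made up for illustration.
video_df = pd.DataFrame({
    "Published year": [2021, 2021, 2022, 2022, 2022],
    "Views": [100, 300, 50, 150, 400],
    "likes": [10, 20, 5, 15, 30],
})

# Total views and likes for each publishing year.
yearly = video_df.groupby("Published year")[["Views", "likes"]].sum()
print(yearly)
```

Calling `yearly.plot(kind="bar")` would turn the same table into a chart, provided matplotlib is installed.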

Most commented video year-wise
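One way to pick the most commented video for each year is `groupby` with `idxmax`. This is a sketch under assumptions: the `Comments` column is hypothetical (the get_video_details function shown earlier does not collect a comment count), and the rows are made-up placeholders.

```python
import pandas as pd

# Placeholder data; "Comments" is an assumed numeric column.
video_df = pd.DataFrame({
    "Tittle": ["A", "B", "C", "D"],
    "Published year": [2021, 2021, 2022, 2022],
    "Comments": [12, 40, 7, 3],
})

# Row label of the highest comment count within each year.
top_idx = video_df.groupby("Published year")["Comments"].idxmax()
most_commented = video_df.loc[top_idx, ["Published year", "Tittle", "Comments"]]
print(most_commented)
```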